personality prompt
VelLMes: A high-interaction AI-based deception framework
Sladić, Muris, Valeros, Veronica, Catania, Carlos, Garcia, Sebastian
There are very few SotA deception systems based on Large Language Models. The existing ones are limited only to simulating one type of service, mainly SSH shells. These systems - but also the deception technologies not based on LLMs - lack an extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber-deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services such as SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, thus VelLMes offers a variety of choices for deception design based on the users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key for its performance. We evaluate the generative capabilities and the deception capabilities. Generative capabilities were evaluated using unit tests for LLMs. The results of the unit tests show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs having a 100% passing rate. In the case of the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. The results showed that about 30% of the attackers thought that they were interacting with a real system when they were assigned an LLM-based honeypot. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed us that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding correctly to most of the issued commands.
- Europe > Czechia > Prague (0.05)
- Asia > Singapore (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- South America > Argentina > Cuyo > Mendoza Province > Mendoza (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)
AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents
Naik, Akshat, Quinn, Patrick, Bosch, Guillermo, Gouné, Emma, Zabala, Francisco Javier Campos, Brown, Jason Ross, Young, Edward James
As Large Language Model (LLM) agents become more widespread, associated misalignment risks increase. While prior research has studied agents' ability to produce harmful outputs or follow malicious instructions, it remains unclear how likely agents are to spontaneously pursue unintended goals in realistic deployments. In this work, we approach misalignment as a conflict between the internal goals pursued by the model and the goals intended by its deployer. We introduce a misalignment propensity benchmark, \textsc{AgentMisalignment}, a benchmark suite designed to evaluate the propensity of LLM agents to misalign in realistic scenarios. Evaluations cover behaviours such as avoiding oversight, resisting shutdown, sandbagging, and power-seeking. Testing frontier models, we find that more capable agents tend to exhibit higher misalignment on average. We also systematically vary agent personalities through different system prompts and observe that persona characteristics can strongly and unpredictably influence misalignment, sometimes more than the choice of model itself. Our results reveal the limitations of current alignment methods for autonomous LLM agents and underscore the need to rethink misalignment in realistic deployment settings.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.92)
- Government (0.67)
PHAnToM: Personality Has An Effect on Theory-of-Mind Reasoning in Large Language Models
Tan, Fiona Anting, Yeo, Gerard Christopher, Wu, Fanyou, Xu, Weijie, Jain, Vinija, Chadha, Aman, Jaidka, Kokil, Liu, Yang, Ng, See-Kiong
Recent advances in large language models (LLMs) demonstrate that their capabilities are comparable, or even superior, to humans in many tasks in natural language processing. Despite this progress, LLMs are still inadequate at social-cognitive reasoning, which humans are naturally good at. Drawing inspiration from psychological research on the links between certain personality traits and Theory-of-Mind (ToM) reasoning, and from prompt engineering research on the hyper-sensitivity of prompts in affecting LLMs capabilities, this study investigates how inducing personalities in LLMs using prompts affects their ToM reasoning capabilities. Our findings show that certain induced personalities can significantly affect the LLMs' reasoning capabilities in three different ToM tasks. In particular, traits from the Dark Triad have a larger variable effect on LLMs like GPT-3.5, Llama 2, and Mistral across the different ToM tasks. We find that LLMs that exhibit a higher variance across personality prompts in ToM also tends to be more controllable in personality tests: personality traits in LLMs like Figure 1: Overview of PHAnToM. Our work investigates GPT-3.5, Llama 2 and Mistral can be controllably how eight different personality prompts (Big Five adjusted through our personality prompts. OCEAN and Dark Triad) affects LLMs' ability to perform In today's landscape where role-play is a common three theory-of-mind reasoning tasks (Information strategy when using LLMs, our research Access (IA), Answerability (AA), and Belief Understanding highlights the need for caution, as models that (BU)).
- Asia > Singapore (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (3 more...)